Documents authored by Schulz, Klaus U.


Document
Information Access to Historical Documents from the Early New High German Period

Authors: Andreas Hauser, Markus Heller, Elisabeth Leiss, Klaus U. Schulz, and Christiane Wanzeck

Published in: Dagstuhl Seminar Proceedings, Volume 6491, Digital Historical Corpora- Architecture, Annotation, and Retrieval (2007)


Abstract
With the new interest in historical documents came the insight that electronic access to these texts causes many specific problems. In the first part of the paper we survey the present role of digital historical documents. After collecting central facts and observations on historical language change, we comment on the difficulties that result for retrieval and data mining on historical texts. In the second part of the paper we report on our own work in this area, with a focus on special matching strategies that help to relate modern-language keywords to old variants. The basis of our studies is a collection of documents from the Early New High German period. These texts exhibit a very rich spectrum of word variants and spelling variations.
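The abstract does not spell out the matching strategies, so what follows is only a minimal sketch of one common way to relate a modern keyword to historical spellings. It assumes a small hand-written set of rewrite rules (`RULES`, hypothetical) combined with Levenshtein distance as a fallback tolerance. The rule set and the function names `variants` and `matches` are illustrative assumptions, not the method of Hauser et al. The test pairs vnd/und, seyn/sein and thun/tun are typical Early New High German spellings.

```python
# Sketch only: hypothetical rewrite rules plus edit distance for matching a
# modern keyword against historical spellings. Not the authors' method.

# Hypothetical rules (modern substring -> historical substring), illustrative only.
RULES = [("u", "v"), ("t", "th"), ("ei", "ey"), ("i", "y")]


def variants(word: str, rules=RULES, max_steps: int = 2) -> set[str]:
    """Spellings reachable from `word` by at most `max_steps` single rule applications."""
    seen = {word}
    frontier = {word}
    for _ in range(max_steps):
        new = set()
        for w in frontier:
            for src, dst in rules:
                pos = w.find(src)
                while pos != -1:  # apply the rule at every occurrence, one at a time
                    new.add(w[:pos] + dst + w[pos + len(src):])
                    pos = w.find(src, pos + 1)
        new -= seen
        seen |= new
        frontier = new
    return seen


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def matches(keyword: str, token: str, max_dist: int = 1) -> bool:
    """True if some rule-generated variant of `keyword` is within `max_dist` edits of `token`."""
    token = token.lower()
    return any(levenshtein(v, token) <= max_dist for v in variants(keyword.lower()))


for modern, old in [("und", "vnd"), ("sein", "seyn"), ("tun", "thun")]:
    print(modern, old, matches(modern, old, max_dist=0))
```

The rules capture systematic, predictable variation cheaply, while the edit-distance tolerance absorbs irregular one-off spellings; in practice such rule sets would be derived from the corpus rather than written by hand.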

Cite as

Andreas Hauser, Markus Heller, Elisabeth Leiss, Klaus U. Schulz, and Christiane Wanzeck. Information Access to Historical Documents from the Early New High German Period. In Digital Historical Corpora- Architecture, Annotation, and Retrieval. Dagstuhl Seminar Proceedings, Volume 6491, pp. 1-8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)


@InProceedings{hauser_et_al:DagSemProc.06491.9,
  author =	{Hauser, Andreas and Heller, Markus and Leiss, Elisabeth and Schulz, Klaus U. and Wanzeck, Christiane},
  title =	{{Information Access to Historical Documents from the Early New High German Period}},
  booktitle =	{Digital Historical Corpora- Architecture, Annotation, and Retrieval},
  pages =	{1--8},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6491},
  editor =	{Lou Burnard and Milena Dobreva and Norbert Fuhr and Anke L\"{u}deling},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.06491.9},
  URN =		{urn:nbn:de:0030-drops-10573},
  doi =		{10.4230/DagSemProc.06491.9},
  annote =	{Keywords: Historical documents, information access, Early New High German, historical language, information retrieval, word similarity, approximate matching}
}
Document
Node Identification Schemes for Efficient XML Retrieval

Authors: Felix Weigel, Klaus U. Schulz, and Holger Meuss

Published in: Dagstuhl Seminar Proceedings, Volume 5061, Foundations of Semistructured Data (2005)


Abstract
Node identifiers (IDs) that encode part of the tree structure of XML documents can save I/O for table look-ups, thus speeding up the evaluation of path and tree queries on large persistent document collections. In particular, binary tree relations such as the extended XPath axes can be either decided for a given pair of node IDs or reconstructed for a single node ID, without access to secondary storage. Several ID schemes have been proposed so far, which differ with respect to (1) expressiveness, i.e. which relations can be decided or reconstructed from IDs, (2) the runtime performance and asymptotic behaviour of decision and reconstruction operations, (3) the storage overhead for the IDs, and (4) robustness, i.e. behaviour in the presence of updates. First we review five ID schemes, positioning them within the trade-off between these four comparison criteria. Then a new ID scheme called BIRD, for Balanced Index-based ID scheme for Reconstruction and Decision, is introduced and illustrated through several examples of decision and reconstruction operations on IDs. We argue that BIRD's strategy in this trade-off, which emphasizes runtime performance and expressive power, is best for many applications, especially where storage minimization is not the primary goal and updates occur in bulk rather than in real time. Our experimental results on document collections of up to one gigabyte show BIRD to be the most efficient scheme in terms of expressiveness and runtime performance. Most notably, BIRD is the only scheme that supports both decision and reconstruction of many relations in constant time. BIRD is also highly competitive in terms of storage and robustness.
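To make the difference between decision and reconstruction concrete, here is a minimal sketch of an arithmetic node-ID scheme under stated assumptions: every node has at most `fanout` children, the tree is at most `max_depth` levels deep, and each node's depth is stored next to its ID. It is essentially a fixed-base Dewey numbering packed into one integer and is not BIRD as defined in the paper, whose weight assignment is not reproduced here; the class and method names are hypothetical. Ancestor decision and parent/ancestor reconstruction each need only a modulo and a subtraction, with no storage look-up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    nid: int    # integer node ID
    depth: int  # root has depth 0


class BalancedIds:
    """Hypothetical arithmetic ID scheme (not BIRD itself).

    Assumes at most `fanout` children per node and depth at most `max_depth`.
    A node at depth d always has an ID that is a multiple of weight(d).
    """

    def __init__(self, fanout: int, max_depth: int):
        self.fanout = fanout
        self.max_depth = max_depth

    def weight(self, depth: int) -> int:
        # Each level reserves (fanout + 1) slots of the next level's weight.
        return (self.fanout + 1) ** (self.max_depth - depth)

    def root(self) -> Node:
        return Node(0, 0)

    def child(self, parent: Node, i: int) -> Node:
        """ID of the i-th child (1-based) of `parent`."""
        if not 1 <= i <= self.fanout:
            raise ValueError("child index out of range")
        if parent.depth >= self.max_depth:
            raise ValueError("maximum depth exceeded")
        d = parent.depth + 1
        return Node(parent.nid + i * self.weight(d), d)

    def ancestor_at(self, node: Node, depth: int) -> Node:
        """Reconstruction: the ancestor of `node` at `depth`, from the ID alone."""
        if not 0 <= depth <= node.depth:
            raise ValueError("depth out of range")
        w = self.weight(depth)
        # Contributions of deeper levels sum to less than w, so modulo strips them.
        return Node(node.nid - node.nid % w, depth)

    def parent(self, node: Node) -> Node:
        return self.ancestor_at(node, node.depth - 1)

    def is_ancestor(self, a: Node, b: Node) -> bool:
        """Decision: is `a` a proper ancestor of `b`? Constant number of operations."""
        return a.depth < b.depth and self.ancestor_at(b, a.depth) == a


ids = BalancedIds(fanout=3, max_depth=3)
a = ids.child(ids.root(), 2)
c = ids.child(ids.child(a, 3), 1)
print(c, ids.parent(c), ids.is_ancestor(a, c))
```

For example, with `fanout=3` and `max_depth=3`, the third child of the root's second child gets ID 44 at depth 2, and `parent` maps ID 45 at depth 3 back to 44. The price of this simplicity is an ID space of size (fanout + 1)^max_depth, which illustrates the storage side of the trade-off described in the abstract.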

Cite as

Felix Weigel, Klaus U. Schulz, and Holger Meuss. Node Identification Schemes for Efficient XML Retrieval. In Foundations of Semistructured Data. Dagstuhl Seminar Proceedings, Volume 5061, pp. 1-23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2005)


@InProceedings{weigel_et_al:DagSemProc.05061.6,
  author =	{Weigel, Felix and Schulz, Klaus U. and Meuss, Holger},
  title =	{{Node Identification Schemes for Efficient XML Retrieval}},
  booktitle =	{Foundations of Semistructured Data},
  pages =	{1--23},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2005},
  volume =	{5061},
  editor =	{Frank Neven and Thomas Schwentick and Dan Suciu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.05061.6},
  URN =		{urn:nbn:de:0030-drops-2292},
  doi =		{10.4230/DagSemProc.05061.6},
  annote =	{Keywords: node identification scheme, labelling scheme, numbering scheme, naming scheme, tree encoding, BIRD}
}